INTRODUCTION


Understanding  normal  biology  and  its  pathological  deviations  requires 
comprehending  the  form  in  which  the  different  tissue  components  work  together  in  their 
native  tissue  context.  In  particular,  it  requires  studying  cell-to-cell  and  cell-to- 
environment  interactions,  which  tend  to  be  highly  variable  between  different  tissue  parts. 
This  information  can  only  be  acquired  by  studying  tissue  in  a  state  as  similar  to  its  native 
context  as  possible. 

Contextual  information,  which  is  important  for  understanding  normal  tissue 
behavior,  becomes  essential  when  studying  Breast  Cancer,  since  heterogeneity  and 
three-dimensionality  are  at  the  very  base  of  cancer  initiation  and  clonal  progression. 
However,  many  analytical  procedures  in  biology  start  by  taking  the  element  under  study 
(RNA,  DNA,  etc)  outside  its  native  context.  By  doing  so,  an  accurate  measure  of  that 
element  is  achieved,  but  priceless  information  on  how  that  element  is  related  to  its 
environment is lost. Other methods, based on immunohistochemical staining of thin
tissue sections for visual observation under a conventional microscope, preserve only
partial contextual information, since they provide a flat, two-dimensional view of a
three-dimensional object.

Our  goal  is  to  develop  a  three-dimensional  computer-based  quantitative 
microscopy  platform  to  be  applied  to  simultaneous  morphological  and  molecular  studies 
of  both  normal  and  neoplastic  mammary  tissue.  Using  our  system,  we  will  be  able  to 
reconstruct  relevant  microscopic  tissue  structures  (e.g.  mammary  ducts,  lobules,  lymph 
nodes,  tumor  masses,  etc.)  from  consecutive  thin  tissue  sections.  Then,  using  the  3D 
virtual  reconstruction,  it  will  be  possible  to  perform  quantitative  morphological 
measurements combined with directed (structure-defined) high-resolution analysis of
molecular  events. 

To  show  the  use  of  our  system,  we  will  study  the  combined  role  of  Estrogen  and 
Progesterone Receptors (ER & PR) in mammary gland development. We will quantify,
cell by cell, the levels of ER and PR in different parts of the mammary gland at different
time points during the development of the gland.
BODY

Our  accomplishments  during  this  year  (08/01/01-07/31/02)  will  be  described 
following  the  Tasks  enumerated  in  the  approved  proposal.  Those  tasks  are  listed  below, 
and  the  sub-tasks  corresponding  to  the  second  funding  year  have  been  underlined. 
Tasks  completed  during  the  first  year,  including  those  originally  scheduled  for  the 
second  or  third  year,  are  highlighted  in  italic.  The  text  has  references  to  the  numbered 
tasks (e.g. see Task 1.1) where the fulfillment of the tasks is explained. Work done to
improve last year's tasks is also referenced.

Task  1.  (Months  1-12)  Modify  an  existing  microscopic  imaging  system  for  acquiring  low 
magnification (1 pixel = 5 µm) images of entire tissue sections and for tracing in 3D
the  ducts  in  the  tissue  specimen  from  a  series  of  images  of  adjacent  sections. 

1.  Complete  the  existing  JAVA  based  software  for  interactive  marking  and  3D 
virtual  rendering  of  ducts  so  that  it  allows  any  branching  pattern.  (Months  1-6) 

2.  Develop  a  customized  VRML  viewer  to  allow  visualization  of  the  branching 
pattern  and  hyperlink  it  to  the  original  images  (Month  6-12) 

Task  2.  (Months  6-30)  Interface  the  existing  acquisition  and  registration  software  with 
the  JAVA  application  to  allow  revisiting  of  acquired  slides  for  inspection  and  high- 
resolution  acquisition  of  areas  of  interest.  (Months  6-12) 

1.  Integrate  the  JAVA  application  with  the  conventional  fluorescence  microscope 
(Months  6-12) 

2.  Integrate  the  JAVA  application  with  the  confocal  microscope  (Month  12-18) 

3.  Integrate  the  high-resolution  images  in  the  VRML  3D  representation  of  the 
mammary  gland  (Months  12-20) 

4.  Integrate  the  image  analysis  methods  for  nuclear  segmentation  with  the  JAVA 
application  (Months  18-24) 

5.  Integrate  the  results  of  the  segmentation  with  the  VRML  representation  of  the 
mammary  gland.  (Month  24-30) 

ACCOMPLISHMENTS

Most  of  the  work  done  this  funding  year  aimed  at  streamlining  the  process  of 
acquiring  and  annotating  large  images  of  entire  tissue  sections,  both  at  low  and  high 
resolution. At low resolution we want to extract tissue structure (i.e. mammary ducts,
lymph  nodes).  At  high  resolution  our  goal  is  to  be  able  to  segment  individual  nuclei  and 
detect  and  quantify  the  expression  of  intranuclear  proteins  (e.g.  estrogen  and 
progesterone  receptors). 

At the end of the first year of the grant (see last year's annual report), the system
that we developed was able to reconstruct in 3D parts of the mammary gland from
contours interactively drawn on the low-resolution images. The user was asked
to manually delineate the contours of ducts or lymph nodes. To that end, the system
provided a set of user-friendly tools that allowed the user to manually segment the
structures and connect them between consecutive sections. Although very accurate, because
it relies on human perception, this interactive method is too labor intensive and is
therefore of little use when trying to reconstruct extensive tissue volumes.

Segmentation  of  tissue  structure  (Task  1.1, 1.2) 

Automatically  segmenting  large  histological  (H&E  stained)  sections  is  a  very 
challenging  process,  due  to  the  extreme  variability  of  tissue  features  and  scales  across 
the  image,  coupled  to  changes  in  image  quality  due  to  uneven  distribution  of  the  staining 
agent  and/or  changes  in  the  effects  of  the  fixative  or  dehydrating  reagent  in  different 
parts  of  the  tissue.  As  a  consequence,  one  should  not  expect  fixed  intensity  patterns  in 
the  image  that  could  be  used  to  segment  all  parts  of  the  image. 

Image Preprocessing. Preprocessing can help correct some of the non-tissue or
protocol-related problems, such as uneven illumination. As an example, Figure 1 shows
how  we  correct  for  an  uneven  illumination  light  source.  Figure  1A  shows  the  original 
image,  which  is  a  composite  of  multiple  single  field-of-view  images.  The  perturbing 
intensity  variation  within  each  field  of  view  is  highlighted  when  all  individual  images  are 
tiled  together  to  create  the  entire  view  of  the  section. 

By simply subtracting a background-only phantom image we can correct for that
disturbing effect, as seen in Figure 1B. Figures 1C and 1D show a closer look at the
correction on a small area of the section.
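The phantom subtraction can be sketched as follows. This is a minimal illustration, assuming each field of view and the phantom are stored as equally sized nested lists of grey levels; the function name `subtract_background` and the step that adds the phantom mean back (to keep the overall brightness) are assumptions of this sketch, not details taken from our acquisition software.

```python
def subtract_background(tile, phantom):
    """Subtract a background-only phantom from one field-of-view image.

    The phantom mean is added back so that overall brightness is preserved
    (an assumption of this sketch); results are clipped to 0-255.
    """
    n_pixels = sum(len(row) for row in phantom)
    phantom_mean = sum(sum(row) for row in phantom) / n_pixels
    corrected = []
    for tile_row, phantom_row in zip(tile, phantom):
        row = []
        for value, background in zip(tile_row, phantom_row):
            v = value - background + phantom_mean
            row.append(min(255, max(0, round(v))))
        corrected.append(row)
    return corrected


# Toy example: a uniform specimen seen through a left-to-right illumination ramp.
phantom = [[100, 110, 120], [100, 110, 120]]
tile = [[90, 100, 110], [90, 100, 110]]
flat = subtract_background(tile, phantom)
```

In this toy case the illumination ramp disappears and every pixel of `flat` has the same value, which is the behavior expected when the individual fields of view are tiled together.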

Tissue segmentation using scale space methods. After correcting the background, we
have tried two different approaches for extracting tissue structures from the H&E stained
sections. The first method is based on scale-space image analysis and is fully automatic.
First, the input image is scaled using different symmetric Gaussian kernels. The
scaling consists of repeated smoothing with decreasing kernel size, which
leaves increasing levels of detail in the image. Combining the results of all scales, we
can detect objects of different sizes. The size of the kernel defines the amount
of Gaussian smoothing applied to the image, and therefore the range of sizes of the
structures that can be detected. In our images, the largest kernel, and therefore the
strongest smoothing, is used to detect lymph nodes or large sections of collecting ducts,
while small kernels are used to detect sections of terminal ducts. The scale
selection mechanism (number of scale levels, maximum and minimum scale levels) is thus an
essential step prior to any object detection algorithm. After filtering, the boundaries of the
remaining objects are found using a normalized Laplacian, whose zero-crossings are
the points of maximum gradient (i.e. the borders) of the original image.

Both filtering and border detection can be combined in one single step. Consider
a symmetric Gaussian function ($t_1 = t_2$),

$$f(x, y) = g(x; t_1)\, g(y; t_2) = \frac{1}{2\pi\sqrt{t_1 t_2}} \exp\left(-\left(\frac{x^2}{2 t_1} + \frac{y^2}{2 t_2}\right)\right),$$

as a model of objects in the tissue image. The scale-space representation $L$ of $f$ is

$$L(x, y; t) = g(x; t + t_1)\, g(y; t + t_2).$$

After a few algebraic manipulations, it can be shown
that for any $t_1 = t_2 > 0$ there is a unique maximum over scales in the normalized Laplacian,

$$\left|(\nabla^2_{norm} L)(0, 0; t)\right| = \frac{t\,(t_1 + t_2 + 2t)}{2\pi\,(t + t_1)^{3/2}\,(t + t_2)^{3/2}}.$$

When $t_1 = t_2 = t_0$, this maximum over scales is given by

$$\partial_t (\nabla^2_{norm} L)(0, 0; t) = 0 \iff t = t_0.$$
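The scale selection property can be checked numerically. The sketch below evaluates the normalized Laplacian response at the center of a Gaussian blob over a range of scales and picks the scale with the largest response; the function names and the sampled scale grid are illustrative assumptions, and the blob variance of 9 is an arbitrary test value.

```python
import math


def normalized_laplacian_at_origin(t, t1, t2):
    """|t * Laplacian(L)| at (0, 0) for a Gaussian blob with variances t1, t2."""
    return t * (2 * t + t1 + t2) / (
        2 * math.pi * (t + t1) ** 1.5 * (t + t2) ** 1.5
    )


def best_scale(t0, scales):
    """Return the sampled scale at which the normalized response is largest."""
    return max(scales, key=lambda t: normalized_laplacian_at_origin(t, t0, t0))


scales = [0.25 * k for k in range(1, 201)]  # 0.25, 0.50, ..., 50.0
selected = best_scale(9.0, scales)          # blob with t1 = t2 = t0 = 9
```

For a symmetric blob the selected scale coincides with the blob variance, which is why the scale at which the response peaks can be used as an estimate of object size.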

In summary, depending on the objects of interest and their features, one can
use different scale setups to delineate regions of interest. Figure 2 shows an
example of the results obtained by this approach. Figure 2A shows a small part of the
image of a section. Figure 2B shows the scale-space maxima (normalized Laplacian)
superimposed on the original image. The algorithm does not
provide very good results on very small objects, and the filtering process imposed
by the scale space method introduces inaccuracy in the definition of the object
boundaries. Although these results cannot be used for a realistic and complete
reconstruction of the structures in 3D, they can be a good initial condition for more
refined segmentation schemes like the one described next.


Tissue segmentation using Level Sets. There is increasing interest in the application of
partial differential equation (PDE) morphologically driven flows (i.e. Level Set methods)
in image processing and analysis. A full description of the Level Set (LS) methodology is
beyond the scope of this report, so only a very succinct, user-focused description is
provided next. In a nutshell, the LS method considers the image as a force or energy field
determined by one or a combination of selected image features (e.g. intensity, gradient,
object curvature, distance). The segmentation of objects is then done by letting some
initial seeds, manually placed on the original image, evolve under the driving force of a
velocity function that depends on the energy field. This way, provided that the right energy
field is selected, the curves (surfaces in 3D) that define the boundaries of the seeds will
converge on or near the boundaries of the objects that one wants to extract.

In the past we had successfully used these methods for edge-preserving filtering
and feature extraction in confocal microscopy [Sarti 00, Ortiz de Solorzano 01, Ortiz de
Solorzano 02]. Although the problem we face now is a much more challenging one, we have
tried applying the LS method to the segmentation of our large histological sections.

The  segmentation  process  is  graphically  described  in  Figures  3,  4  and  5.  Figure 
4  shows  the  segmentation  of  part  of  an  H&E  stained  section  from  a  tissue  block  of 
human  ductal  carcinoma  in  situ  of  the  breast  (DCIS).  Figure  5  contains  an  example  of 
segmentation  of  normal  murine  mammary  gland  structure.  We  start  by  interactively 
defining  the  region  of  interest  (ROI)  of  the  image  where  the  segmentation  flow  is  going  to 
be  applied.  This  is  done  by  drawing  a  rectangle  on  the  image  of  the  section  (Figure  3A, 
green  area  selected  in  the  upper  left  image).  Then  the  user  is  asked  to  confirm  the 
parameters that will be used in the segmentation of that ROI (Figure 3B). These are the
parameters of the PDE that will be solved, and they define the behavior of the flow as a
function of the image features. Some parameter tuning is required when changing from
one type of image to another (for example from bright field to fluorescence or from
human to mouse tissue), or between images of the same type acquired under different
conditions. However, once the optimum set of parameters has been found, those
parameters can be used for the entire set of sections that compose a tissue block.
Once the parameters have been confirmed, the system pops up a new window
with the selected ROI. On that image, the user is asked to draw the seeds for the flow,
which can be closed polygons or, as in the case shown in Figures 4A and 5A, a few
points (see the blue one-pixel wide points in Figures 4A and 5A). Then the user can start the
flow, which in a relatively short time (less than a minute for a 1000x1000 image) will converge
to the boundaries of the desired structures in the image (Figures 4B and 5B). Once
accepted, the boundaries are incorporated into the original image of the section, as
shown in Figures 4C and 5C. Small errors (merged objects, spurious objects) can
then be corrected from the interface, using new and existing interactive tools.

Using this method we have greatly reduced the time required to annotate the
cases. This task, which initially took 40 hours for a case composed of 60 sections, can
now be done in 8-10 hours, and the type of interaction now required is less tiresome
than the interaction initially required (manually drawing contours). We continue working
on methods to further reduce the interaction.

Nuclear  Segmentation  (Task  2.4) 

After  all  tissue  structures  on  the  H&E  stained  sections  have  been  detected  and 
reconstructed  in  3D,  our  goal  is  to  be  able  to  incorporate  molecular  information  at  the 
cellular level into the 3D rendition. As previously described, we use the 3D
volumetric reconstruction of the tissue to select areas of interest to be revisited at higher
magnification on intermediate fluorescently stained sections. To do so, we first scan the
sections at low magnification. We then register the fluorescent sections with the
contiguous H&E sections and use the 3D reconstruction to identify areas of interest that
are then acquired at high magnification (40X). These high magnification areas are
counterstained and, depending on the type of information sought, immunostained or in
situ hybridized. To explain this process we will use a case where the sections, taken from
a fully sectioned biopsy of a patient with ductal carcinoma in situ of the breast (DCIS),
were counterstained with DAPI and hybridized using FISH. We used two probes, one for
the centromere of chromosome 17 and another for the erbB2 gene.

The  areas  were  taken  at  40X  through  three  consecutive  scans  with  the  filter 
adapted  to  each  of  the  three  fluorochromes.  Figures  6A,  6G  &  6H  show  the  original 
images  that  we  will  use  to  show  the  segmentation  process.  These  are  the  steps  and 
methods  used: 
A. Background correction: Before trying to identify individual nuclei in the
counterstained image (Fig. 6A), we start by separating all stained (foreground) areas
from the unstained (background) parts. First we smooth the image with a
Gaussian filter to reduce spurious intensity peaks. We used a 7x7 Gaussian
kernel ($\mu = 0$; $\sigma^2 = 5$). Then we apply a background correction step to compensate
for uneven illumination. The algorithm creates a background map of the image after
smoothing the image with a large Gaussian kernel (Figure 6B; Figure 6C is a
contrast-stretched version of 6B). The map is created by polynomial fitting of equidistant
sample points selected from the filtered image. The algorithm to create the
background map is as follows:

1. Select a number of points from the image that are going to be used to create
the background map (note brightness and location). In our algorithm, we
selected 64x64 points for 1024x1024 sized images. The number of points
grew proportionally with the size of the images. For each point we calculated
the average brightness of the corresponding neighborhood.

2. Construct a background function from the above values by least-squares
fitting. The (m,n) order bivariate polynomial, the functional form of the
background, can be written as

$$B(x, y) = a_{mn} x^m y^n + \ldots + a_{22} x^2 y^2 + a_{21} x^2 y + a_{12} x y^2 + a_{11} x y + a_{10} x + a_{01} y + a_{00}$$

We used a 7 degree fitting. The brightness of the 64x64 points (B(x,y)) is
used to calculate the seven fitted constants, or seven coefficients of the
second order polynomial, by least squares.

3.  Using  the  coefficients,  a  complete  background  image  B(x,  y)  is  reconstructed. 

4.  The  background  image  is  subtracted  from  the  original  image. 

5. The resulting image is rescaled to occupy the complete grey-level range of 0-255.

The  result  after  background  subtraction  can  be  seen  in  Figure  6D. 
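The fitting step of the background map can be sketched as below. This is a simplified illustration and not the code used to produce Figure 6: it assumes a second-order polynomial with six terms (1, x, y, xy, x², y²), coordinates scaled to the range 0..1, and a plain normal-equations solver; all function names are hypothetical.

```python
def poly_terms(x, y):
    """Basis of a second-order bivariate polynomial (six terms, an assumption)."""
    return [1.0, x, y, x * y, x * x, y * y]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_background(samples):
    """Least-squares fit; samples is a list of (x, y, brightness) tuples."""
    k = 6
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for x, y, value in samples:
        terms = poly_terms(x, y)
        for i in range(k):
            atb[i] += terms[i] * value
            for j in range(k):
                ata[i][j] += terms[i] * terms[j]
    return solve_linear(ata, atb)


def evaluate_background(coeffs, x, y):
    """Evaluate the fitted background B(x, y) at one position."""
    return sum(c * t for c, t in zip(coeffs, poly_terms(x, y)))


def truth(x, y):
    """Known smooth background used only to generate test samples."""
    return 50 + 20 * x - 10 * y + 5 * x * y


# Equidistant 8x8 sample grid standing in for the 64x64 grid of the report.
samples = [(i / 7, j / 7, truth(i / 7, j / 7)) for i in range(8) for j in range(8)]
coeffs = fit_background(samples)
```

Evaluating `evaluate_background(coeffs, x, y)` at every pixel gives the complete background image of step 3, which is then subtracted and rescaled as in steps 4 and 5.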

B. Separation of Foreground and Background. The smoothed, background-corrected
image is amplitude thresholded at the global mean intensity value $\mu$, and all the
connected components in the foreground are identified by component labeling. In the
second stage, the mean gray level $\mu_i$ of each connected component i is calculated.
Then each connected component i is further thresholded at its own threshold
value $k \cdot \mu_i$ to try to separate individual nuclei within the foreground areas. The
tuning factor k is set empirically (default k = 0.5). Figure 6E shows the result of
multi-level thresholding on a small part of our test image. Then a gray scale
morphological operation (closing + opening with a 7 x 7 approximately circular
structuring element) is applied to the amplitude thresholded image to reduce the
number of holes within the objects, along with spurious offshoots at the object surface.
Holes and offshoots are the result of improper staining or, most commonly, of
non-homogeneous chromatin distribution within the nucleus, which produces some lightly
stained areas that cannot be extracted by the amplitude thresholding algorithm
previously described.
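The two-stage thresholding can be sketched as follows. This is a minimal illustration on nested lists, assuming 4-connected component labeling and a per-component threshold equal to k times the component mean; the function names are hypothetical and the morphological clean-up is not included.

```python
from collections import deque


def label_components(mask):
    """4-connected component labeling of a binary mask (list of lists)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def two_stage_threshold(image, k=0.5):
    """Global mean threshold, then a k * (component mean) threshold per component."""
    rows, cols = len(image), len(image[0])
    global_mean = sum(map(sum, image)) / (rows * cols)
    mask = [[v > global_mean for v in row] for row in image]
    labels, count = label_components(mask)
    sums = [0.0] * (count + 1)
    sizes = [0] * (count + 1)
    for r in range(rows):
        for c in range(cols):
            sums[labels[r][c]] += image[r][c]
            sizes[labels[r][c]] += 1
    return [[labels[r][c] > 0
             and image[r][c] > k * sums[labels[r][c]] / sizes[labels[r][c]]
             for c in range(cols)] for r in range(rows)]


# Two bright nuclei joined by a dim bridge (value 60) on a dark background.
image = [[0, 0, 0, 0, 200, 200, 60, 200, 200, 0, 0, 0, 0, 0, 0, 0]]
refined = two_stage_threshold(image, k=0.5)
n_objects = label_components(refined)[1]
```

In this toy row the global threshold keeps the bridge, so the two nuclei form one component; the second, component-specific threshold removes the bridge and leaves two objects.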

C.  Enhancing  Desired  Concavities:  The  previously  described  object  surface  smoothing 
step  reduces  the  fragmentation  of  the  nucleus  during  automatic  segmentation. 
However,  at  the  same  time  it  can  eliminate  genuine  concavities  between  overlapping 
and/or  touching  objects.  To  enhance  those  necessary  concavities  the  local  gradient 
magnitude  image  is  obtained  by  calculating  the  intensity  gradient  of  the  smoothed 
image  and  thresholding  at  the  average  gradient  magnitude.  Primary,  secondary  and 
tertiary local gradient peaks in the gradient image are retained. The primary gradient
peak is the pixel with the maximum gradient magnitude in a 3x3 neighborhood
of the gradient image. The secondary and tertiary peaks are the second
and third largest gradient magnitude values in each 3x3 neighborhood. The
skeleton of the gradient peak image provides an approximate boundary of the
objects, with discontinuities where cells touch one another. This
apparently adverse effect can be used to enhance the concavity of the cell nuclei
surface by converting those pixels which correspond to local gradient peaks into
background pixels, creating a deeper concavity where cell nuclei touch or appear to
overlap one another.

D. Separation of Touching Nuclei. The foreground of the two-tone (binary) image
obtained from the previous step is subjected to a thinning process. Thinning is
implemented through iterative erosion of the boundary pixels. Ideally, the process
converges until a unique signature is obtained for each cell nucleus in the region of
interest. In practice, after the first iteration we check every signature for its size. If a
signature consists of only a few pixels (in our experiments this minimum threshold was
empirically set to ten pixels), the signature is discarded on the assumption that it is a
noisy signature formed by offshoots at the nucleus surface. This is done only in
the first step of erosion. Then both a minimum and a maximum object size are set. In
our case we used eleven and five hundred pixels respectively. Any signature falling
between those limits is considered a unique signature of its cell nucleus. Such a
signature is not subjected to further thinning. The signature image is processed further
by a morphological closing operator (a 7x7 structuring element with an effectively
circular kernel shape is used). This to some extent forces the signatures to have a
circular shape. In the second step, the cell signatures are subjected to controlled
dilation.

The signatures are then grown into their neighboring background pixels under
certain conditions:

1. Two or more signatures are not allowed to overlap.

2. Signatures are grown only into their immediately neighboring background pixels.

3. A signature cannot be grown/dilated more than (1 + number of iterations that the
signature was eroded).

4. The growing process is terminated when the grown region covers all the
foreground pixels in the original two-tone image.

If $I_T$ is the two-tone image and $I_D$ is the dilated image with each signature having
its own unique label, then $I_{S1} = I_T \wedge I_D$ gives an image where most of the touching
cell nuclei are isolated. This segmentation is neither complete nor accurate. There
are many fragmented nuclei due to the formation of more than one signature per cell.
At the same time there might be a few clustered nuclei that are not segmented
due to failure to find a unique signature for each nucleus in the cluster. Thus, it is
necessary to recognize the isolated individual nuclei in the segmented image, so that
the next step of segmentation is applied only to clusters. The isolated cell nuclei
are recognized based on their relative size and intensity features. The relative object
size and relative object intensity of each individual object in $I_{S1}$ are calculated. All
the objects with relative mean object intensity less than 0.3 are considered
artifacts and eliminated (this threshold is set empirically, but it works satisfactorily in
almost every example). Objects with relative size above 1.3 or below 0.7 are
flagged for the second stage of the cluster segmentation. The relative size of an
object, $r_{V_i}$, is defined as the ratio of the size of the object to the average size of all
the objects in the image. If the size of object i is $V_i$, then

$$r_{V_i} = \frac{V_i}{\frac{1}{p} \sum_{j=1}^{p} V_j},$$

where p is the number of isolated objects present in $I_{S1}$. The relative intensity of an
object, $r_{I_i}$, is defined as the ratio of the average intensity of the foreground pixels of
the object to the average intensity of all foreground pixels. If the average intensity of
object i is $I_i$, then

$$r_{I_i} = \frac{I_i}{\frac{1}{N_f} \sum_{(x, y) \in F} I(x, y)},$$

where $N_f$ is the number of foreground pixels in $I_{S1}$, $F$ is the set of those pixels and
$I(x, y)$ is the intensity at pixel (x, y).
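The relative size and intensity filter can be sketched as below, using the thresholds quoted in the text (0.3 for intensity, 0.7 and 1.3 for size). The function names, the category strings and the toy measurements are illustrative assumptions; the foreground mean is computed here as the size-weighted mean of the object intensities.

```python
def relative_features(sizes, mean_intensities, foreground_mean):
    """Relative size and relative intensity of every object."""
    mean_size = sum(sizes) / len(sizes)
    rel_size = [v / mean_size for v in sizes]
    rel_intensity = [i / foreground_mean for i in mean_intensities]
    return rel_size, rel_intensity


def classify(sizes, mean_intensities, foreground_mean,
             min_intensity=0.3, size_range=(0.7, 1.3)):
    """Label each object as artifact, isolated nucleus, or cluster-stage candidate."""
    rel_size, rel_int = relative_features(sizes, mean_intensities, foreground_mean)
    result = []
    for rs, ri in zip(rel_size, rel_int):
        if ri < min_intensity:
            result.append("artifact")
        elif rs < size_range[0] or rs > size_range[1]:
            result.append("cluster-stage")
        else:
            result.append("isolated")
    return result


# Toy measurements: object sizes in pixels and mean intensities per object.
sizes = [100, 100, 100, 160, 100]
intensities = [120, 130, 110, 125, 20]
foreground_mean = sum(s * i for s, i in zip(sizes, intensities)) / sum(sizes)
classes = classify(sizes, intensities, foreground_mean)
```

With these values the fourth object is flagged for the cluster stage because of its size, and the fifth is discarded as a dim artifact.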


The rest of the cell nuclei in the image are subjected to the next step of segmentation.
This segmentation step involves only those objects which are flagged for further
processing based on their relative size feature. Let $I_{S2}$ be an image with the signatures
of such objects. All the objects in the image that share common boundaries are given
the same label. This merges most of the fragmented cell nuclei into one object. The
resulting image is then passed through the relative size filter to isolate possible single
cells formed by merging. The rest of the image is processed using the watershed
algorithm [Beucher 92].

The path generated distance transform proposed by Borgefors [Borgefors 86] is
first applied to image $I_{S1}$. Identifying flat (homogeneous) regions in the
distance map and rescaling the distance values of those pixels, to reduce flat fields
inside the reconstructed grey object, was found to improve the performance of
watershed techniques. The watershed algorithm on a reconstructed grey image can
be described in a few steps. Let dist(.) represent the distance value of pixels in the
distance map.

Step 1: All the connected groups of pixels having the maximum distance in the
image domain are considered markers. A marker may be a single pixel, a group of
connected pixels or several groups of connected pixels. The markers are
labelled and stored as a marker image. Let $d_{max}$ be the maximum distance
in the image domain, $d_{next}$ the next maximum distance level and $d_{min}$ the
minimum distance value.

Step 2: Pixels having a distance value $d_{next}$ and located in the neighbourhood
of the labelled regional markers are merged with their neighbouring regional
marker. Isolated pixels or groups of connected pixels with distance $d_{next}$
that do not have a labelled regional marker in their immediate neighbourhood
are considered new markers and given a new unique label.

Step 3: $d_{max} = d_{next}$

Step 4: $d_{next}$ = next maximum distance value in the image

Step 5: If $d_{max} \neq d_{min}$, then steps 2, 3 and 4 are repeated.
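The steps above can be sketched as a level-by-level flooding of the distance map. This is a minimal illustration, assuming an integer distance map stored as nested lists, 8-connectivity, and background pixels marked with distance 0; the loop over sorted distance levels stands in for the repeated update of the maximum and next distance values, and the function name is hypothetical.

```python
from collections import deque


def watershed_by_levels(dist):
    """Label a distance map by flooding from its highest level downwards.

    Pixels with distance 0 are treated as background and stay unlabelled (0).
    """
    rows, cols = len(dist), len(dist[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0

    def neighbours(r, c):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    yield nr, nc

    levels = sorted({v for row in dist for v in row if v > 0}, reverse=True)
    for level in levels:
        pending = [(r, c) for r in range(rows) for c in range(cols)
                   if dist[r][c] == level]
        # Pixels touching a marker from a higher level are merged with it.
        queue = deque()
        for r, c in pending:
            for nr, nc in neighbours(r, c):
                if dist[nr][nc] > level and labels[nr][nc]:
                    labels[r][c] = labels[nr][nc]
                    queue.append((r, c))
                    break
        # The merged front spreads through connected pixels of the same level.
        while queue:
            y, x = queue.popleft()
            for ny, nx in neighbours(y, x):
                if dist[ny][nx] == level and labels[ny][nx] == 0:
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
        # Remaining connected groups at this level become new markers.
        for r, c in pending:
            if labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in neighbours(y, x):
                        if dist[ny][nx] == level and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
    return labels, next_label


# Distance map of two touching blobs with separate maxima.
dist = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 2, 1, 2, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
labels, n_markers = watershed_by_levels(dist)
```

The two maxima give two markers, and the lower-level pixels are divided between them, which splits the touching pair into two labelled objects.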

The resulting image $I_{S2}$ is filtered using size filters. The size threshold values for
this filter are calculated from image $I_{S1}$. All the objects that fall below the size
threshold limit are considered fragments and merged with the nearest larger object. If
the fragment is connected to more than one object, it is merged with the object
with which it shares the largest common boundary. Objects which are above the
maximum size limit are flagged for interactive correction. In our experiments, we have
not come across any case where we had to do interactive correction.

E.  Improving  the  accuracy  of  segmentation  by  boundary  search.  The  accuracy  of  the 
segmentation  obtained  by  the  above  sequential  combination  of  different  techniques 
depends  on  the  accuracy  of  the  thresholding  during  initial  processing  stages. 
Moreover,  the  shape  of  the  objects  is  influenced  by  the  structuring  elements  used  in 
the  morphological  operations.  This  is  because  the  separation  of  connected  objects  is 
not  governed  by  gradient  peaks  but  by  the  concavity  at  the  surfaces  where  they 
touch  one  another.  Thus  the  boundary  of  the  cell  nuclei  obtained  by  the  above 
process  may  not  depict  actual  boundary  location.  Therefore  we  have  to  identify  those 
boundary  segments  which  are  common  to  more  than  one  cell  nucleus.  This  is  done 
by searching the eight-neighborhood of the boundary pixels. If a boundary pixel
has at least one neighboring pixel in the background, that boundary pixel belongs to the
external boundary of the cluster of cells. Edge pixels that do not have any
background pixels in their immediate neighborhood are considered segments of
the nucleus boundary that separates touching or clustered cell nuclei. Each
boundary segment that separates touching nuclei or cell clusters is uniquely
labeled for further processing. Besides improving the actual boundary detection, the
algorithm should also serve to detect noisy boundary segments that may be dividing
a single object into two.
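The eight-neighborhood test can be sketched as follows. This is a minimal illustration on a labelled image stored as nested lists (0 for background, a positive integer per nucleus); the function name is hypothetical, and treating positions outside the image as background is an assumption of this sketch.

```python
def classify_boundary_pixels(labels):
    """Split boundary pixels into external ones and ones shared between nuclei.

    Returns two sets of (row, col) positions: pixels with at least one
    background neighbour, and pixels that touch another label but no background.
    """
    rows, cols = len(labels), len(labels[0])
    external, shared = set(), set()
    for r in range(rows):
        for c in range(cols):
            own = labels[r][c]
            if own == 0:
                continue
            touches_background = False
            touches_other = False
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    other = labels[nr][nc] if inside else 0
                    if other == 0:
                        touches_background = True
                    elif other != own:
                        touches_other = True
            if touches_background:
                external.add((r, c))
            elif touches_other:
                shared.add((r, c))
    return external, shared


# Two touching nuclei (labels 1 and 2) surrounded by background.
labelled = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
]
external, shared = classify_boundary_pixels(labelled)
```

Only the two pixels in the middle row where the nuclei meet have no background neighbour, so they form the shared boundary segment that is examined further.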

A small neighborhood of each pixel of the labeled boundary segments is
searched for a high gradient peak. We have used a neighborhood of four pixels
on either side of the boundary pixel, along the direction of the intensity gradient.
There are two methods for searching the pixels along the normal vector,
namely basic line search and stratified line search. The stratified search technique
breaks the search region $S_i$ into disjoint segments of length $l$,

$$S_i = \bigcup_{j=1}^{m/l} S_i^j,$$

where

$$S_i^j = \{\, v_i^{lj+k} = v_i + (lj + k)\cdot \mathbf{n}_i;\ k = -1, 0, 1, \ldots \,\},$$

assuming $l$ is odd.

Once the smaller region containing the optimum edge pixel is found, a basic
line search strategy can be applied to select the most appropriate edge pixel within
the region. For each pixel $v_i$ in the initial boundary, where $i = 0, 1, 2, \ldots$, the basic search
technique restricts the search to the region

$$S = \bigcup_{i=1}^{n} S_i,$$

where $S_i$ contains voxels on the normal vector $\mathbf{n}_i$,

$$S_i = \{\, v_i^k = v_i + k\cdot \mathbf{n}_i;\ k = -\tfrac{m-1}{2}, \ldots, -1, 0, 1, \ldots, +\tfrac{m-1}{2} \,\},$$

assuming that $m$ is odd without loss of generality. Considering a search of three pixels,
basic line search has the computational complexity $O(nm^3)$; the corresponding
expression for stratified line search is [missing in the source]. A boundary segment is
considered genuine if at least [numerator illegible in the source]/3
of the pixels constituting that boundary segment correspond to a local gradient peak
that is above the average gradient magnitude. One can devise many other conditions to
identify noisy boundary segments. In the present case, the simple scoring
mentioned above has given acceptable results. Figure 6F shows the final result of
marking all cell nuclei detected using the overall scheme described above.
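
The basic line search and the segment scoring can be sketched as follows. This is a
minimal illustration under stated assumptions rather than our implementation: the
gradient magnitude is assumed to be available as a 2D list, normals are given as
(row, column) steps, a pixel is counted as supporting the segment when the peak found
along its normal exceeds the mean gradient of the image, and the acceptance fraction
is left as the hypothetical parameter `min_fraction`.

```python
def line_search_peak(grad, point, normal, half_width=4):
    """Basic line search: largest gradient magnitude among the pixels
    point + k * normal for k = -half_width .. +half_width.
    Returns (value, (row, col)) or None if no sample falls inside the image."""
    rows, cols = len(grad), len(grad[0])
    best = None
    for k in range(-half_width, half_width + 1):
        r = int(round(point[0] + k * normal[0]))
        c = int(round(point[1] + k * normal[1]))
        if 0 <= r < rows and 0 <= c < cols:
            if best is None or grad[r][c] > best[0]:
                best = (grad[r][c], (r, c))
    return best


def segment_is_genuine(grad, segment, normals, min_fraction, half_width=4):
    """Score a boundary segment: count the pixels whose gradient peak along
    the normal is above the average gradient magnitude of the image."""
    mean_grad = sum(map(sum, grad)) / (len(grad) * len(grad[0]))
    hits = 0
    for point, normal in zip(segment, normals):
        peak = line_search_peak(grad, point, normal, half_width)
        if peak is not None and peak[0] > mean_grad:
            hits += 1
    return hits >= min_fraction * len(segment)


# Toy gradient image: a strong edge next to the first three segment pixels only.
grad = [[1] * 9 for _ in range(5)]
for r in range(3):
    grad[r][5] = 10
segment = [(r, 4) for r in range(5)]   # candidate boundary at column 4
normals = [(0, 1)] * 5                 # search horizontally
print(line_search_peak(grad, (0, 4), (0, 1)))
print(segment_is_genuine(grad, segment, normals, min_fraction=0.5))
```

Here three of the five segment pixels find a gradient peak above the image mean, so
the segment is kept for a `min_fraction` of 0.5 and would be rejected for 0.75.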

FISH/gene expression quantification (Task 2.4)

To identify genetically aberrant cells we count the number of fluorescent
signals present in each cell nucleus, or measure the integrated fluorescence intensity over
the nucleus area. We use a probe to the centromere of chromosome 17 as a reference to
enumerate the erbB2 gene copy number, which we consider an indicator of
malignancy. Figures 6G and 6H show the FITC (ctr. 17) and CY3 (erbB2) channels of our
sample image, to which the FISH segmentation algorithm is applied. A reasonable method
to detect FISH signals and to determine their parameters should be translation, scaling
and rotation invariant and should be able to detect the range of parameters of the signal.
The parameters must be determined as accurately as the level of noise permits.
A simple algorithm for detecting the signals that satisfies these
conditions is to locally threshold the image at an appropriate level and
characterize each signal by its intensity, size and shape properties to distinguish it
from noise. The Top-hat filter is well suited to enhancing ‘spot’-like structures in the image.
The region of interest for counting the FISH signals lies only within the cell nuclei. For this
purpose the segmented and labeled image obtained with the algorithm described in
the previous section is superimposed on each FISH signal channel. Regions
outside the cell nuclei and regions of truncated cell nuclei are discarded. All groups
of connected pixels in a FISH signal channel that lie inside nuclear regions are examined
to determine whether they are FISH signals or artifacts. The following processing steps
are used to enhance FISH signals, discard noise and analyze the FISH signals:

1. Background correction by second-degree polynomial fit

2. Global Top-hat filtering by subtracting a morphologically opened and further
smoothed version of the FISH image from the original image. This,
theoretically, results in a reduction of the dominant background haze.

3. Local Top-hat filtering to enhance each FISH spot

4. Local thresholding to detect the FISH spots

5. Component labeling within each cell nucleus domain and determination of
FISH spot features such as size (in pixels), maximum and average
brightness, relative brightness, etc.

6. Elimination of those spots that do not conform to the accepted relative features
of a FISH signal, such as relative size and relative intensity

7. Relabeling of the FISH spots in each cell nucleus domain

8. Labeling of the cells in the tissue image with a label equal to the number of FISH
signals present in the cell (Figures 6H and 6I)

9. Calculation of statistics of FISH amplification/distribution across the cells in the
same image, across different images in the same section and across different
sections constituting the same structure, such as a duct or tumor, in the data set.
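
The core of steps 2 to 8 can be sketched in a few lines. This is a simplified
illustration under stated assumptions, not our implementation: a single Top-hat pass
with a square window stands in for the global and local filters, a fixed threshold
replaces local thresholding, spots are accepted on size alone, and all names and
parameter values (`count_spots`, `radius`, `threshold`, `min_size`, `max_size`) are
hypothetical.

```python
from collections import deque


def _filter(img, radius, pick):
    """Apply min or max (pick) over a square window, clipped at the borders."""
    rows, cols = len(img), len(img[0])
    return [[pick(img[rr][cc]
                  for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                  for cc in range(max(0, c - radius), min(cols, c + radius + 1)))
             for c in range(cols)] for r in range(rows)]


def top_hat(img, radius):
    """White top-hat: the image minus its morphological opening."""
    opened = _filter(_filter(img, radius, min), radius, max)
    return [[v - o for v, o in zip(row, orow)] for row, orow in zip(img, opened)]


def count_spots(fish, nuclei, radius=1, threshold=5, min_size=1, max_size=9):
    """Return {nucleus label: number of accepted spots}; label 0 is background."""
    enhanced = top_hat(fish, radius)
    rows, cols = len(fish), len(fish[0])
    seen = [[False] * cols for _ in range(rows)]
    counts = {label: 0 for row in nuclei for label in row if label}
    for r in range(rows):
        for c in range(cols):
            label = nuclei[r][c]
            if not label or seen[r][c] or enhanced[r][c] < threshold:
                continue
            # Flood-fill one group of connected bright pixels inside this nucleus.
            size, queue = 0, deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and nuclei[ny][nx] == label
                            and enhanced[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_size <= size <= max_size:   # discard implausible spot sizes
                counts[label] += 1
    return counts


# Toy data: flat haze of 10, two nuclei, three spots inside nuclei, one spot
# outside any nucleus and one large bright blob that the top-hat suppresses.
rows, cols = 8, 14
fish = [[10] * cols for _ in range(rows)]
nuclei = [[0] * cols for _ in range(rows)]
for r in range(1, 7):
    for c in range(1, 6):
        nuclei[r][c] = 1
    for c in range(8, 13):
        nuclei[r][c] = 2
for r, c in ((2, 2), (4, 4), (6, 9), (0, 7)):
    fish[r][c] = 50
for r in range(1, 5):
    for c in range(9, 13):
        fish[r][c] = 50
counts = count_spots(fish, nuclei)
print(counts)
```

In this toy case nucleus 1 receives two signals and nucleus 2 one; the spot lying
outside the nuclei is ignored and the 4 x 4 blob is removed by the opening because
it is larger than the filter window.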

Automatic  Registration  of  tissue  sections  (in  progress,  Tasks  1.1, 1.2) 

Manually registering each pair of sections of a case to achieve a smooth, reliable
reconstruction of all tissue structures requires a considerable time investment. We are
implementing a method for automatically registering the sections that uses the
"Hierarchical Chamfer Matching Algorithm" (HCMA) on each pair of neighboring images.
Developed by G. Borgefors [Borgefors 86], this method [Hult 96] projects a contour area
of the image to be registered onto the distance transform of the reference image. This is
done at different positions (both translating and rotating the contour image). At each
position, we sum the values of those pixels in the distance image that coincide with
locations where there is a contour in the image to be registered.
A perfectly matched image would then give zero as its result, since in distance images
edges have a value of zero. The best position is the one that minimizes this sum.
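
The matching score can be illustrated with a small, single-resolution sketch. It is
written under stated assumptions and is not our registration code: the distance image
is computed with a two-pass 3-4 chamfer transform, only translations are searched
(rotations are omitted for brevity), and the function names are hypothetical.

```python
def chamfer_distance(edges):
    """Two-pass 3-4 chamfer distance transform of a binary edge image."""
    rows, cols = len(edges), len(edges[0])
    big = 10 ** 9
    dist = [[0 if edges[r][c] else big for c in range(cols)] for r in range(rows)]
    forward = ((-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3))
    backward = ((1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3))
    for mask, row_order, col_order in (
            (forward, range(rows), range(cols)),
            (backward, range(rows - 1, -1, -1), range(cols - 1, -1, -1))):
        for r in row_order:
            for c in col_order:
                for dr, dc, w in mask:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        dist[r][c] = min(dist[r][c], dist[rr][cc] + w)
    return dist


def chamfer_score(dist, contour, shift):
    """Sum of distance values under the shifted contour (None if out of bounds)."""
    rows, cols = len(dist), len(dist[0])
    total = 0
    for r, c in contour:
        rr, cc = r + shift[0], c + shift[1]
        if not (0 <= rr < rows and 0 <= cc < cols):
            return None
        total += dist[rr][cc]
    return total


def best_translation(dist, contour, max_shift):
    """Exhaustive search for the translation with the smallest score."""
    best = None
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            score = chamfer_score(dist, contour, (dr, dc))
            if score is not None and (best is None or score < best[0]):
                best = (score, (dr, dc))
    return best


# Reference image: a square outline. Contour to register: same outline, displaced.
square = [(r, c) for r in range(4, 9) for c in range(5, 10)
          if r in (4, 8) or c in (5, 9)]
reference = [[0] * 12 for _ in range(12)]
for r, c in square:
    reference[r][c] = 1
contour = [(r - 2, c - 3) for r, c in square]
dist = chamfer_distance(reference)
print(best_translation(dist, contour, 3))
```

The search recovers the displacement (2, 3) with a score of zero, the value expected
for a perfect match.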

To save time, a multiresolution approach is used. We start by applying the algorithm in
multiple neighborhoods of a subsampled version of the original image. This gives us an
initial estimate of the optimum registration that we iteratively refine at increasing
levels of resolution. In each iteration, those points found to be not sufficiently informative
are eliminated from further computation. The search is repeated until the original resolution
level is reached. At this level the most sensitive matching is performed.
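
The coarse-to-fine refinement can be sketched as follows. This is a schematic example
under stated assumptions: the mismatch measure is abstracted as a hypothetical
function `cost(shift, level)`, each pyramid level is assumed to halve the resolution,
only translations are refined, and the elimination of uninformative points is omitted.

```python
def coarse_to_fine_shift(cost, levels, coarse_range):
    """Estimate a translation by searching widely at the coarsest level and
    refining within +/- 1 pixel at every finer level.

    cost(shift, level) must return the mismatch of `shift` (expressed in
    pixels of that level); level 0 is the original resolution."""
    top = levels - 1
    candidates = [(dr, dc)
                  for dr in range(-coarse_range, coarse_range + 1)
                  for dc in range(-coarse_range, coarse_range + 1)]
    best = min(candidates, key=lambda s: cost(s, top))
    for level in range(top - 1, -1, -1):
        # One pixel at the coarser level corresponds to two at this level.
        center = (2 * best[0], 2 * best[1])
        candidates = [(center[0] + dr, center[1] + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
        best = min(candidates, key=lambda s: cost(s, level))
    return best


# Hypothetical cost with a known optimum at a full-resolution shift of (13, -6).
def toy_cost(shift, level):
    scale = 2 ** level
    return abs(shift[0] * scale - 13) + abs(shift[1] * scale + 6)


estimate = coarse_to_fine_shift(toy_cost, levels=3, coarse_range=4)
print(estimate)
```

With three levels, the search evaluates 81 candidates at the coarsest level and 9 at
each finer one, instead of scanning the full range at the original resolution.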

In addition to our work on automatic registration, a ‘lock zoom area’ option has
been added to the software. With this option, the zoom areas of the two section
images displayed on our interface can be ‘locked’ so that, provided
that the sections are properly registered, both show the same area of the sections.

Other  Improvements 

Other less visible though equally important improvements have been implemented,
within the overarching goal of speeding up the acquisition, registration, annotation and
analysis of the images. These are some of the improvements:


To reconstruct the cases in 3D, our rendering algorithm calculates an optimized
Delaunay triangulation from the boundaries of the structures (ducts, tumors)
manually or semiautomatically extracted from the sections. Initially, the
triangulation was calculated every time the case was reconstructed, even when
using the same parameters (number of sections rendered, selected structures
rendered) as in previous renderings. To reduce the time required to render each
case, we now store the results of the triangulation performed the first time a case
is rendered. We also incrementally store the results of new additions to the case
every time the case is re-rendered with new parameters. This way we can reuse them or
build on top of them the next time the case needs to be reconstructed. By doing
this, we have reduced the time required for the second and subsequent renderings of
a case from several minutes to approximately 30 seconds. (Task 1.1)
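
The reuse of stored triangulation results can be sketched as an incremental cache.
This is an illustration under stated assumptions, not our rendering code: results are
assumed to be keyed by (section, structure) and kept in an in-memory dictionary, and
`triangulate` is a hypothetical stand-in for the expensive triangulation step.

```python
class TriangulationCache:
    """Store triangulation results so that re-rendering only computes new parts."""

    def __init__(self, triangulate):
        self._triangulate = triangulate   # hypothetical expensive function
        self._store = {}                  # (section, structure) -> mesh
        self.computed = 0                 # number of triangulations performed

    def render(self, sections, structures):
        meshes = []
        for section in sections:
            for structure in structures:
                key = (section, structure)
                if key not in self._store:
                    # Only parts never rendered before are triangulated.
                    self._store[key] = self._triangulate(section, structure)
                    self.computed += 1
                meshes.append(self._store[key])
        return meshes


cache = TriangulationCache(lambda section, structure: f"mesh({section},{structure})")
cache.render(range(3), ["duct"])            # first rendering: 3 triangulations
first = cache.computed
cache.render(range(3), ["duct"])            # same parameters: nothing recomputed
second = cache.computed
cache.render(range(4), ["duct", "tumor"])   # new parameters: only additions
third = cache.computed
print(first, second, third)
```

The second call performs no new triangulation, and the third only computes the one
new duct section and the four tumor sections that were not already stored.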

As described in last year’s report, the areas to be acquired at high magnification can
be selected by drawing a rectangle around them in the low-magnification
image of the section. Ideally, multiple areas distributed across one section
could thus be selected and imaged in the background overnight. In practice, the quality of
the images (their degree of focus) was very poor for the second and subsequent
areas when the first two areas were located far apart in the section. This was due to the
automatic focusing process, which used as its reference a single manually focused
point located in the first area. To overcome this limitation we have updated the
software so that it briefly visits all the areas that are going to be acquired in batch
mode, asking the user to focus once in each area. The system now stores those
focus values and uses them when acquiring each area. This way, with a very small
amount of user interaction, the quality of the images has greatly improved.
Furthermore, some software changes were required to handle the
acquisition of multiple images in batch mode. Initially all acquired images were kept
in main memory until accepted and added to the case they belong to. This rapidly
exhausted the computer’s resources when more than two or three average-size
multicolor images were taken. Now we store the images in temporary files, to be
retrieved by the user when he/she is ready to accept the images and add them to the
case. (Task 2.3)
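
The temporary-file strategy can be sketched as follows. This is a minimal example
under stated assumptions, written in Python for brevity rather than in the language
of our software: images are treated as raw byte strings, and the class and method
names (`PendingImageStore`, `add`, `accept`, `pending`) are hypothetical.

```python
import os
import tempfile


class PendingImageStore:
    """Keep acquired images on disk until the user accepts them."""

    def __init__(self):
        self._paths = {}   # image name -> path of its temporary file

    def add(self, name, pixel_bytes):
        """Write a newly acquired image to a temporary file and free the memory."""
        with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as handle:
            handle.write(pixel_bytes)
            self._paths[name] = handle.name

    def accept(self, name):
        """Load the image back into memory and delete its temporary file."""
        path = self._paths.pop(name)
        with open(path, "rb") as handle:
            data = handle.read()
        os.remove(path)
        return data

    def pending(self):
        """Names of the images still waiting to be accepted."""
        return sorted(self._paths)


store = PendingImageStore()
store.add("area_1", bytes(range(10)))
store.add("area_2", bytes(5))
waiting = store.pending()
image = store.accept("area_1")
print(waiting, len(image), store.pending())
```

Only the file paths stay in memory while images are pending, so the number of areas
acquired in one batch is limited by disk space rather than by main memory.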

Revisiting areas of interest at high magnification is normally done after acquiring and
reconstructing the case at low resolution. This means that, when reacquiring the
areas of interest, the slide has to be placed under the microscope again. Since this is done
by the microscope operator, there is always a slight difference between the new
position of the slide and the one it had when the low-resolution image was
taken. Therefore, a registration step is required here, which is done by interactively
identifying pairs of points under the microscope and on the image to calculate the
shift. (Tasks 1.1, 2.3)
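
As an illustration, the shift can be estimated from the point pairs as the mean
displacement, which is the least-squares solution when the slide is assumed to have
only been translated. The function name, the coordinate convention and the sample
coordinates below are hypothetical.

```python
def estimate_shift(stage_points, image_points):
    """Mean translation (dx, dy) that maps image points onto stage points."""
    n = len(stage_points)
    dx = sum(s[0] - i[0] for s, i in zip(stage_points, image_points)) / n
    dy = sum(s[1] - i[1] for s, i in zip(stage_points, image_points)) / n
    return dx, dy


# Three landmarks identified on the low-resolution image and under the microscope.
image_points = [(100, 200), (400, 250), (300, 600)]
stage_points = [(112, 195), (413, 246), (311, 594)]
shift = estimate_shift(stage_points, image_points)
print(shift)
```

Averaging over several pairs reduces the effect of small errors in the manual
identification of each landmark.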

The nuclei/gene expression segmentation of the areas of interest previously
described has been integrated into R3D2 (the reconstruction software), to allow the
background analysis of all areas acquired in one case. This way we can, for
example, start the nuclear segmentation and FISH gene enumeration in all the areas
taken from all the sections of a case. After the parameters for all segmentation steps
have been manually defined, the software automatically analyzes all the areas of a case.
The output of the analysis, pending the spatial statistical analysis that is under way,
consists of color-coded images that represent, for each nucleus in the image, the number
of FISH signals (or 0 or 1 when dealing with nuclear gene expression). These color-coded
images can be invoked from the 3D reconstruction of the case, as already
explained for the original images. (Task 2.5)

The 3D reconstruction of the case is now more interactive, in that individual volumes as
well as groups of volumes (ducts, tumors, etc.) can be selected. Unselected
objects can then be removed from the scene to give a better view of the
selected volumes. We have also incorporated some new interactive tools that allow
merging of volumes. This is very important when, due to missing or torn sections, the
native structures cannot be rendered completely from the manual or semi-automatic
annotations. Finally, the opacity of the volumes can be changed in real time without
having to re-render the entire case. Reduced opacity can help in understanding the
spatial distribution of all the elements of the scene. (Task 1.2)

Hollow structures can now be rendered in 3D. (Task 1.2)

New icons have been added to the 2D and 3D visualization windows to allow faster
interaction with the software. (Task 1.1)

A new option now allows opening only a selected range of sections of the case, or no
sections at all, while still allowing any range of sections to be rendered in 3D. This
speeds up the use of the software when one only wants to render the case in 3D,
without having to load the entire case. (Task 1.2)

PROBLEMS


The main problems, and how we are dealing with them, have been described in the
previous chapter. The only scheduled task not accomplished among Tasks 1 and 2 is Task
2.2. The integration of the system with the confocal microscope has been delayed due to
the work required to optimize the existing code and the perception that, in fact, the
integration is not essential for the fulfillment of this project’s goals.

Although Tasks 1 and 2 have not been affected by it as much as Task 3, the overall
progress of the project has slowed down slightly, due mainly to the PI’s approved leave
of absence from February to June to fulfill a previously acquired teaching
commitment. Although a new postdoc came on board in January to compensate in part
for this, the necessary adaptation to the new place and to the goals of the project
prevented the new postdoc’s input from completely compensating for the absence of the PI.
However, we believe that we can accomplish the goals of the project within the
funding period plus perhaps a short (6 months) non-funded extension starting at the end
of the last funding year.

Task 3. (Months 6-36) Use #1 & #2 to perform 3-D reconstruction maps of the
distribution of ER and PR in the ductal tree and determine whether ER colocalizes with
PR in mouse mammary epithelium during critical phases of mammary development.

1. Select the animals (20) to study, taken at different stages of development.
Extract, fix and section the tissue as explained in the Methods Section (Months
6-12, 4 specimens; Months 12-24, 8 specimens; Months 24-36, 8 specimens)

2. Reconstruct the mammary gland using the software developed according to Task 1.
(Months 12-24, 5 specimens; Months 24-36, 15 specimens)

3. Revisit areas of interest for high resolution imaging and detection/colocalization
of Progesterone and Estrogen receptors (Months 24-36, 20 sections). Integrate
the information (Months 30-36)

During the reporting period we should have imaged and reconstructed 5 of the 20
selected glands. In fact, due to the already mentioned delay and the work on improving
the system, only three glands have been imaged and none of them has been completely
reconstructed yet. However, since the system is now at a stage where an entire case can
be imaged and reconstructed in a week, we believe that by the end of the year we will
have reconstructed all the proposed glands.

KEY RESEARCH ACCOMPLISHMENTS


During  this  year  we  have  continued  working  on  the  improvement  of  the  system  for 
acquisition  and  reconstruction  of  entire  tissue  blocks,  and  we  have  worked  on  ways  to 
incorporate  the  molecular  analysis.  Namely, 

•  Our  work  in  several  aspects  of  the  software  has  substantially  reduced  the  time 
required  to  image,  annotate  and  reconstruct  the  tissue  structures,  from 
approximately  a  month  to  a  week. 

• The high-resolution images of areas of interest can now be easily acquired
directly from the JAVA interface. Those areas can now also be invoked from the
3D reconstruction of the tissue.

•  We  have  almost  completed  the  software  that  can  segment  all  the  nuclei  and 
quantify  FISH  signals  or  gene  expression  from  the  fluorescent  high-magnification 
areas  of  interest.  The  results  of  the  analysis,  although  the  statistical  spatial 
analysis  is  under  way,  can  be  visualized  from  the  3D  reconstruction  of  the  tissue 
linked  to  the  original  images. 

REPORTABLE OUTCOMES


Manuscripts: 

• “Recent advances in quantitative digital image analysis and applications in Breast
Cancer” Ortiz de Solorzano C., Callahan D.E., Parvin B., Costes S., Barcellos-Hoff
M.H. Review paper accepted for Microscopy Research and Technique.

• “A geometric model for image analysis in cytology” Ortiz de Solorzano C., Malladi R.,
Lockett S. In: Geometric methods in bio-medical image processing. Ravikanth
Malladi (Ed.). Springer Verlag 2002, pp. 19-42.

• “A system for combined three-dimensional morphological and molecular analysis of
thick tissue samples” Fernandez-Gonzalez R., Jones A., Garcia-Rodriguez E., Chen
P.Y., Idica A., Barcellos-Hoff M.H., Ortiz de Solorzano C. Accepted for Microscopy
Research and Technique.


Presentations: 

• “A system for computer-based reconstruction of 3-dimensional structures from serial
tissue sections: an application to the study of normal and neoplastic mammary gland
biology” Microscopy and Microanalysis ’01, Long Beach, CA, August 5th-9th, 2001.
Platform presentation.

• “3D Histo-Pathology: towards a morphological characterization of ductal carcinoma in
situ of the breast” Annual Meeting of the American Association for Cancer Research
(AACR). San Francisco, CA, April 4-9, 2002.


Informatics: 

• As described in the Body of the report and in the Reportable Outcomes sections, we
have developed and integrated new methods to automatically extract histological
information from tissue sections, as well as morphological and molecular information
at the cellular level.


Funding  obtained: 

• Segmentation of Mammary Gland Ductal Structure Using Geometric Methods. PIs:
Malladi R. and Ortiz de Solorzano C. Granted by the LBNL Laboratory Directed
Research and Development Program (LDRD), in the Strategic-Computational
Sub-Program. Period: Oct 2001-Sept 2003

• Characterization of Adult Stem Cell Involvement in Mammary Gland Development.
PI: Dr. Carlos Ortiz de Solorzano. Funded by: LBNL Laboratory Directed Research
and Development Program (LDRD). Period: Oct 2002-Sept 2004

• Three-dimensional Modeling of breast cancer progression. PI: Dr. Carlos Ortiz de
Solorzano. Funded by: University of California, Breast Cancer Research Program.
Grant Number: 8WB-0150


Employment  or  Research: 

•  Based  on  the  successful  performance  of  the  PI  as  a  Scientist  during  the  last  year,  he 
has  been  offered  a  Staff  Scientist  Position  at  the  Life  Sciences  Division,  Lawrence 
Berkeley  National  Laboratory  of  the  University  of  California. 


•  This  grant  continues  supporting  Mr.  Rodrigo  Fernandez-Gonzalez,  a  Ph.D.  candidate 
in  the  joint  UC  Berkeley-UC  San  Francisco  Program  in  Bioengineering.  Rodrigo 
continues  working  with  me  part  time  as  a  Graduate  Student  Research  Assistant. 

• Halfway through the reporting period, Dr. Umesh Adiga, a Ph.D. in Computer
Sciences, joined my lab as a postdoctoral fellow to work on the image analysis
involved in the automation of the segmentation of nuclei and FISH signals, as well as
on other image analysis and processing tools required for this project.

•  Mr.  Adam  Idica,  an  Integrated  Biology  undergraduate  student  at  UC  Berkeley 
continues  providing  invaluable  assistance  in  the  acquisition  and  annotation  of  the 
tissue  specimens. 

CONCLUSIONS


In summary, the goals for the reporting year have been mostly accomplished. On
the one hand, we have kept improving our three-dimensional computerized microscopy
platform for imaging and 3D reconstruction of tissue blocks. Most of the work done was
directed towards reducing the amount of time and interaction required for imaging,
annotating and reconstructing the cases. The time for reconstructing a case has been
reduced from a month down to approximately one or two weeks (depending on the extent
of the tissue block, the number of sections used for the reconstruction, the density of
tissue structures within the block, etc.). In addition to developing new fully automated or
less interactive methods for image registration and annotation, some practical issues related
to computer memory usage and handling of large images have been successfully solved
by optimizing the code and increasing the computer’s memory and the bandwidth of our
internal laboratory computer network.

In parallel, we have worked on the automation of nuclear and FISH/gene
expression detection in high-magnification areas of interest, for instance tumor areas or
surrounding normal tissue. These areas are normally fluorescently stained with a nuclear
counterstain (e.g. DAPI) and either multicolor DNA-FISH or immunostaining of proteins
expressed inside the nucleus. At this point we have a prototype of a segmentation
algorithm that can be run in batch mode to process all areas of a case. Processing starts
by segmenting all individual nuclei in the counterstained image. Then the system
analyzes the other image channels, taken with filters adapted to the staining
fluorochromes used. Analysis consists of enumerating FISH signals to determine gene
copy number, or quantifying the amount of protein expression to determine whether that
particular cell is positive for that protein. This process can be run in background mode,
overnight and without human interaction, to provide molecular information on the hundreds
of thousands of cells that make up the tissue.

Exciting and powerful as this is, we are also facing several problems, such as how to
meaningfully extract genetic (or protein expression) information for largely
heterogeneous cell populations. New nuclei classifiers are being developed to
distinguish between, for instance, epithelial cells and fibroblasts, based on the distinct
morphology of their nuclei. Similar efforts are under way to classify epithelial versus
other types of compartments using morphological parameters at the tissue level (i.e.
using the entire images).

Data handling is another challenging problem that includes image storage and
retrieval (a complete data set can be as big as 6 GB of image data), as well as
the interpretation of analysis data from hundreds of thousands of nuclei at a time. Both
problems will be solved using morphology as a tool for both storage and retrieval of
image and analytical data. Therefore we will use morphological cues for both retrieval of
information and visualization of the analysis results.

As  stated  in  last  year’s  report,  the  relevance  of  what  we  have  achieved  is  quite 
significant,  since  our  system  can  and  will  eventually  be  applied  to  a  variety  of  studies, 
both  molecular  and  clinical,  which  will  greatly  profit  from  the  additional  -third-  dimension. 

The  interest  of  this  system  as  a  tool  for  whole-gland  studies  is  somewhat  proven 
by  the  fact  that  a  new  project  has  been  recently  funded  to  use  this  platform  for  studying 
the  evolution  of  mammary  tumors  in  a  mouse  model. 

Figure 1. Background correction. A. Original image showing a distorting pattern due to uneven
illumination of the light source. B. Corrected image. C. Zoom in an area of A. D. Corrected zoomed
image.

Figure 2. Tissue structure segmentation using scale-space approach. A. Original image
from a section of normal paraffin-embedded, H&E-stained mammary gland. B. Results of
the initial segmentation. The black boundaries are the result of the automatic
segmentation method described in the text, superimposed on the original image. These
results can be used as the initial condition for subsequent, more refined segmentation
techniques.

Figure 3. Tissue segmentation using Level Sets. A. View of the interface. The green
square  in  the  top  left  window  is  interactively  defined  by  the  user.  That  is  the  image  space 
where  the  segmentation  will  be  done.  B.  Interactive  window  for  introducing  the 
parameters  of  the  level  set  algorithm. 

Figure 4. Tissue segmentation using Level Sets (cont.). The example uses a section
from an H&E stained tissue block of ductal carcinoma in situ (DCIS) of the breast. A.
Original region of interest with the seeds of the level set flow (blue pixels). B. Automatic
segmentation results. C. Results of the segmentation incorporated into its section. Our
software provides the user with some interactive tools for correcting the segmentation
results, whenever needed.

Figure 5. Tissue segmentation using Level Sets (cont.). The example uses a section
from an H&E stained tissue block of a mouse mammary gland. A. Original region of
interest with the seeds of the level set flow (blue pixels). B. Automatic segmentation
results. C. Results of the segmentation incorporated into its original section.

Figure 6. Analysis of high magnification areas. A. Original, DAPI counterstained image.
B. Background map calculated by polynomial fitting. C. Background map (stretched). D.
Original image after background map subtraction. E. Binary image obtained by multistep
segmentation of the corrected image (see text for details). F. Final result of the entire
nuclear segmentation process (see text for details) superimposed on the original image.
G. FISH channel 1, corresponding to the hybridization of the centromere of chromosome
17. Original image. H. FISH channel 2, corresponding to the hybridization of a probe to
the erbB2 gene. I. Color coded image showing the number of copies of the probe used in
image G. J. Color coded image showing the number of copies of the probe used in
image H.